From Universal Dependencies to Abstract Syntax
Authors
Abstract
Abstract syntax is a tectogrammatical tree representation, which can be shared between languages. It is used for programming languages in compilers, and has been adapted to natural languages in GF (Grammatical Framework). Recent work has shown how GF trees can be converted to UD trees, making it possible to generate parallel synthetic treebanks for the 30 languages that are currently covered by GF. This paper attempts to invert the mapping: take UD trees from standard treebanks and reconstruct GF trees from them. Such a conversion is potentially useful in bootstrapping treebanks by translation. It can also help GF-based interlingual translation by providing a robust, efficient front end. However, since UD trees are based on natural (as opposed to generated) data and built manually or by machine learning (as opposed to rules), the conversion is not trivial. This paper presents a basic algorithm, which is essentially based on inverting the GF-to-UD conversion. This method covers around 70% of nodes, and the rest can be covered by approximate backup strategies. Analysing the reasons for the incompleteness reveals structures missing in GF grammars, but also some problems in UD treebanks.
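The idea of inverting a GF-to-UD mapping can be illustrated with a small sketch. This is not the paper's algorithm: the rule tables, the `UDNode` class, and the `convert` and `coverage` helpers are all hypothetical, and the function names (`DetCN`, `UseV`, `PredVP`) are only loosely modelled on GF's resource grammar. Each rule pairs a GF-like function with the dependency label its non-head argument would receive in UD; reading the rules backwards builds an abstract syntax tree bottom-up, and any dependent that no rule can attach is reported as uncovered.

```python
from dataclasses import dataclass, field


@dataclass
class UDNode:
    word: str
    pos: str      # UD part-of-speech tag
    label: str    # dependency label to the head ("root" at the top)
    children: list = field(default_factory=list)


# Hypothetical lexical rules: UD POS tag -> GF-like category.
LEXICAL = {"NOUN": "CN", "DET": "Det", "VERB": "V", "PRON": "NP"}

# Hypothetical binary rules, read backwards from a GF-to-UD labelling:
# (function, head category, dependent label, dependent category, result).
# In this toy set the dependent is always the first argument.
BINARY = [
    ("DetCN", "CN", "det", "Det", "NP"),
    ("PredVP", "VP", "nsubj", "NP", "Cl"),
]

# Hypothetical unary rules: (function, argument category, result).
UNARY = [("UseV", "V", "VP")]


def convert(node):
    """Return (category, tree, uncovered words) for the subtree at node."""
    cat = LEXICAL.get(node.pos, "Unknown")
    tree = f"{node.word}_{cat}"
    pending, uncovered = [], []
    for child in node.children:
        c_cat, c_tree, c_uncovered = convert(child)
        pending.append((child.label, c_cat, c_tree, child.word))
        uncovered.extend(c_uncovered)
    changed = True
    while changed:  # apply rules until none fits any more
        changed = False
        for fun, arg_cat, result in UNARY:
            if cat == arg_cat:
                cat, tree, changed = result, (fun, tree), True
        for fun, head_cat, label, dep_cat, result in BINARY:
            for dep in pending:
                if cat == head_cat and dep[0] == label and dep[1] == dep_cat:
                    cat, tree, changed = result, (fun, dep[2], tree), True
                    pending.remove(dep)
                    break
    # Dependents that no rule consumed would need a backup strategy.
    uncovered.extend(dep[3] for dep in pending)
    return cat, tree, uncovered


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children)


def coverage(node):
    """Fraction of UD nodes attached to the tree by the rules."""
    _, _, uncovered = convert(node)
    return 1 - len(uncovered) / count_nodes(node)


# "the cat sleeps quietly": the toy rules have no entry for adverbs.
sentence = UDNode("sleeps", "VERB", "root", [
    UDNode("cat", "NOUN", "nsubj", [UDNode("the", "DET", "det")]),
    UDNode("quietly", "ADV", "advmod"),
])

if __name__ == "__main__":
    print(convert(sentence))
    print(coverage(sentence))
```

In this toy setting, the adverb is left over because no rule mentions `advmod`, which mirrors on a small scale the kind of incompleteness the abstract attributes to structures missing from the grammar.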
Similar resources
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Cross-Lingual Syntax: Relating Grammatical Framework with Universal Dependencies
GF (Grammatical Framework) and UD (Universal Dependencies) are two different approaches using shared syntactic descriptions for multiple languages. GF is a categorial grammar approach using abstract syntax trees and hand-written grammars, which define both generation and parsing. UD is a dependency approach driven by annotated treebanks and statistical parsers. In closer study, the grammatical ...
An Approach to use Executable Models for Testing
This paper outlines an approach to test programs by transforming them into executable models. Based on OMG’s metamodelling framework MOF in combination with an action language extension for the definition of operational semantics, we use QVT to transform abstract syntax trees as code representations into executable models. We argue that these models provide an adequate abstraction for simulatio...
Invited Talk: The Case for Universal Dependencies
Universal Dependencies is a recent initiative to develop a linguistically informed, cross-linguistically consistent dependency grammar analysis and treebanks for many languages, with the goal of enabling multilingual natural language processing applications of parsing and natural language understanding. I outline the needs behind the initiative and how some of the design principles follow from ...
Dependency Annotation Choices: Assessing Theoretical and Practical Issues of Universal Dependencies
This article attempts to place dependency annotation options on a solid theoretical and applied footing. By verifying the validity of some basic choices of the current dependency reference framework, Universal Dependencies (UD), in a perspective of general annotation principles, we show how some choices can lead to inconsistencies and discontinuities, partly due to UD’s alternation between synt...
[hal-00772522, v1] Relating nominal and higher-order abstract syntax specifications
Nominal abstract syntax and higher-order abstract syntax provide a means for describing binding structure which is higher-level than traditional techniques. These approaches have spawned two different communities which have developed along similar lines but with subtle differences that make them difficult to relate. The nominal abstract syntax community has devices like names, freshness, name-a...